30 research outputs found

    Use of Weighted Finite State Transducers in Part of Speech Tagging

    This paper addresses issues in part of speech disambiguation using finite-state transducers and presents two main contributions to the field. One of them is the use of finite-state machines for part of speech tagging. Linguistic and statistical information is represented in terms of weights on transitions in weighted finite-state transducers. Another contribution is the successful combination of techniques -- linguistic and statistical -- for word disambiguation, compounded with the notion of word classes.
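
To make the idea of weights on transitions concrete, here is a minimal sketch that picks the lowest-cost tag path through a toy weighted lattice. It is not the paper's transducer construction: the lexicon, the tag set, the costs (read them as negative log probabilities), and the function name `best_tagging` are all hypothetical, and the search is a plain Viterbi pass that adds costs along a path and keeps the minimum.

```python
# Hypothetical toy lexicon: each word maps to candidate tags with a cost.
# Lower cost means more plausible (think negative log probability).
LEXICON = {
    "the": {"DET": 0.1},
    "can": {"NOUN": 1.2, "VERB": 1.6, "AUX": 0.7},
    "rusts": {"VERB": 0.4, "NOUN": 2.0},
}

# Hypothetical costs on transitions between adjacent tags.
TRANSITIONS = {
    ("<s>", "DET"): 0.2,
    ("DET", "NOUN"): 0.3,
    ("DET", "VERB"): 3.0,
    ("DET", "AUX"): 3.0,
    ("NOUN", "VERB"): 0.5,
    ("NOUN", "NOUN"): 1.5,
    ("AUX", "VERB"): 0.4,
    ("AUX", "NOUN"): 2.5,
    ("VERB", "NOUN"): 1.0,
    ("VERB", "VERB"): 2.5,
}
DEFAULT_COST = 5.0  # assumed penalty for tag pairs not listed above


def best_tagging(words):
    """Return (total_cost, tags) for the cheapest tag sequence."""
    # paths maps the last tag to the best (cost, tag sequence) ending there.
    paths = {"<s>": (0.0, [])}
    for word in words:
        new_paths = {}
        for tag, emit_cost in LEXICON[word].items():
            best = None
            for prev_tag, (cost, seq) in paths.items():
                step = TRANSITIONS.get((prev_tag, tag), DEFAULT_COST)
                total = cost + step + emit_cost
                if best is None or total < best[0]:
                    best = (total, seq + [tag])
            new_paths[tag] = best
        paths = new_paths
    return min(paths.values(), key=lambda p: p[0])


if __name__ == "__main__":
    cost, tags = best_tagging(["the", "can", "rusts"])
    print(tags, round(cost, 2))
```

With these made-up costs the ambiguous word "can" resolves to NOUN, because the cheap DET-to-NOUN transition outweighs the lower lexical cost of AUX.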

    GIST-IT: Summarizing Email Using Linguistic Knowledge and Machine

    The setting is a small Mississippi River town in the 1830s, and the characters are the children and grown-ups of the town. Tom Sawyer is the main character, and the reader follows him throughout the book (Kuiper 1122). This book was largely based on Mark Twain's boyhood. The famous whitewashing scene actually happened. Mark was Tom, getting other little boys to do his work. He also got lost in that very same cave. Huck Finn was the same way. Huck was based on Twain's boyhood friend and "idol", Tom Blankenship. Tom Sawyer, the main character of the work, is hardly the "model boy". He is just like any other boy: mischievous and irresponsible, yet goodhearted. He reminds us all of how we used to be at that age. We did whatever we could to have fun. He is a thirteen-year-old boy filled with adventures and excitement.

    Issues In Text-To-Speech For French

    This paper reports the progress of the French text-to-speech system being developed at AT&T Bell Laboratories as part of a larger project for multilingual text-to-speech systems, including languages such as Spanish, Italian, German, Russian, and Chinese. These systems, based on diphone and triphone concatenation, follow the general framework of the Bell Laboratories English TTS system [?], [?]. This paper provides a description of the approach, the current status of the French text-to-speech project, and some problems particular to French.

    La synthèse de la parole et le traitement automatique des langues.


    Information Retrieval Based on Context Distance and Morphology

    We present an approach to information retrieval based on context distance and morphology. Context distance is a measure we use to assess the closeness of word meanings. This context distance model measures semantic distances between words using the local contexts of words within a single document as well as the lexical co-occurrence information in the set of documents to be retrieved. We also propose to integrate the context distance model with morphological analysis in determining word similarity so that the two can enhance each other. Using the standard vector-space model, we evaluated the proposed method on a subset of the TREC-4 corpus (AP88 and AP90 collections, 158,240 documents, 49 queries). Results show that this method improves the 11-point average precision by 8.6%.
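
For readers unfamiliar with the metric, the following is a minimal sketch of 11-point interpolated average precision for a single query, using the common definition: precision is interpolated at recall levels 0.0, 0.1, ..., 1.0 and the eleven values are averaged. The abstract does not say exactly how the authors computed it, so treat the function name `eleven_point_average_precision` and its input format as assumptions.

```python
def eleven_point_average_precision(ranked_relevance, total_relevant):
    """Average of interpolated precision at recall 0.0, 0.1, ..., 1.0.

    ranked_relevance: list of booleans in rank order, True if the
        document at that rank is relevant to the query.
    total_relevant: number of relevant documents for the query.
    """
    if total_relevant == 0:
        return 0.0

    # Record (recall, precision) each time a relevant document is found.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))

    # Interpolated precision at a recall level is the highest precision
    # observed at any recall greater than or equal to that level.
    interpolated = []
    for i in range(11):
        level = i / 10
        candidates = [p for r, p in points if r >= level - 1e-12]
        interpolated.append(max(candidates, default=0.0))

    return sum(interpolated) / 11


if __name__ == "__main__":
    # Relevant documents at ranks 1 and 3, two relevant documents in total.
    print(eleven_point_average_precision([True, False, True, False], 2))
```

A system-level score would average this value over all queries (49 in the evaluation described above).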

    Using word class for part-of-speech disambiguation

    This paper presents a methodology for improving part-of-speech disambiguation using word classes. We build on earlier work for tagging French where we showed that statistical estimates can be computed without lexical probabilities. We investigate new directions for coming up with different kinds of probabilities based on paradigms of tags for given words. We base estimates not on the words, but on the set of tags associated with a word. We compute frequencies of unigrams, bigrams, and trigrams of word classes in order to further refine the disambiguation. This ne